Abstract: Distributed storage systems have many emerging applications, e.g., in wireless sensor networks and cloud storage. Since storage nodes in wireless sensor networks have limited battery power, it is valuable to find a repair scheme with optimal transmission cost (e.g., energy). Optimal-cost repair has recently been investigated in a centralized setting. However, a centralized control mechanism may be unavailable or very expensive. For such scenarios, it is of interest to study optimal-cost repair in a decentralized setup. We formulate optimal-cost repair as a convex optimization problem for networks with convex transmission costs. We then use primal and dual decomposition to decouple the problem into subproblems that are solved locally. Thus each surviving node, collaborating with the other nodes, can minimize its own transmission cost such that the global cost is minimized. We further study the optimality and convergence of the algorithms. Finally, we discuss the code construction and determine the field size required to find feasible network codes in our approach.

I. Introduction 

Wireless sensor networks consist of several small devices (e.g., sensors) that measure or detect a physical quantity of interest, e.g., temperature, dust, or light. The main characteristics of these sensors are limited battery, low CPU power, limited communication capability, and small memory [1]. These nodes are often vulnerable. Thus, to make the data reliable over these unreliable nodes, the data can be encoded and distributed among small storage devices [1], [2], [3]. When a storage node fails, an autonomous algorithm should regenerate a new storage node to maintain the reliability of the system. This process is generally known as repair. The repair process causes traffic and incurs transmission cost. Repair with the aim of minimizing traffic leads to optimal-bandwidth (traffic) regenerating codes [2]. Repair with the objective of minimizing transmission cost leads to minimum-repair-cost regenerating codes, e.g., in [4].

The regenerating code [2] in a distributed storage system with $n$ nodes is a type of erasure code by which any $k$ ($k \le n$) out of $n$ nodes can reconstruct the original file. This property, called the regenerating code property (RCP), is desirable since it is optimal in providing reliability for a given amount of storage. In the repair process, the new node may not have the same coded symbols as the lost node, yet it preserves the RCP. This type of repair is known as functional repair. Reference [2] also models distributed storage systems and the repair process by a directed acyclic graph, namely, the information flow graph. The graph involves three types of nodes: a source node, storage nodes, and a data collector. When a node fails, surviving nodes send $\gamma$ bits of coded symbols to the new node. Cut analysis on the information flow graph shows the fundamental storage-bandwidth tradeoff. In [?], it is shown that the tradeoff can be achieved by deterministic/random linear network codes [?]. In [1] and [2], decentralized approaches for erasure code construction have been proposed, based respectively on fountain codes and random linear network coding.

Reference [4] seeks to minimize the repair cost with the RCP preserved. Furthermore, surviving node cooperation (SNC) is proposed in [4]; that is, a surviving node can combine the data from other surviving nodes with its own data. There, the transmission cost is optimized for linear costs in a centrally controlled way. Here we study optimal-cost repair in a decentralized manner. This scenario is of interest when central control is difficult or expensive. For instance, centralized control of distributed storage in wireless sensor networks is difficult or even impossible. To achieve decentralized minimum-cost repair, we first formulate the problems as convex optimization problems. Then we study decentralized methods for finding an optimal-cost subgraph, decoupled from code construction. For this purpose, we present two distributed algorithms based on primal and dual decomposition. Given the minimum-cost subgraph, we show that there exists a code over a finite field that regenerates the new node properly.

The rest of the paper is organized as follows. We formulate the minimum-cost repair problem in Section II. Then, Section III presents primal and dual decomposition algorithms for finding the minimum-cost repair subgraph in a distributed way. In Section IV, we discuss code construction and the required field sizes.

II. Problem Formulation 

Consider a network with $n$ nodes and paths connecting the nodes. We denote the transmission cost from node $i$ to node $j$ by the function $f_{ij}$. We consider only convex costs. Thus, if $z_{(ij)}$ is the number of bits (packets) transmitted from node $i$ to node $j$, then $f_{ij}(z_{(ij)})$ is a convex function of $z_{(ij)}$. We assume that each node knows the cost of the links to its neighbors in the network. For simplicity, we assume that the network is delay-free and acyclic. In what follows, we first present the modified information flow graph used to analyze the repair process.
Fig. 1. Modified information flow graph for a four-node tandem network.
There are directed channels connecting node 1 to node 2, node 2 to node 3,
and node 3 to node 4, respectively. Node 4 fails and node 5 is the new node.



A. Modified Information Flow Graph 

Consider a storage system in which a source file of size $M$ is distributed among $n$ nodes such that each node stores $\alpha$ units and any $k$ out of $n$ nodes can rebuild the original file. We denote the source file by an $M \times 1$ vector $s$. Then the code on node $i$ can be represented by a matrix $Q_i = (q_i^1, \dots, q_i^\alpha)$ of size $M \times \alpha$, where each column $q_i^j$ holds the code coefficients of fragment $j$ on node $i$. The stored data in node $i$ is $x_i = Q_i^T s$. We can then describe the flow of information (and the topology of the network) in a distributed storage system by a directed acyclic graph $G(n, k, \alpha) = (N, A)$, where $N$ is the set of nodes and $A$ is the set of directed links.
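As an illustration of the stored data $x_i = Q_i^T s$, the following minimal Python sketch encodes a small file vector with one node's coefficient matrix. The prime field GF(7), the function name `encode_node`, and the example matrix are assumptions made for this example only; the text does not fix a field or specific coefficients at this point.

```python
def encode_node(Q, s, p):
    """Return x = Q^T s over GF(p).

    Q is an M x alpha matrix (list of M rows), s is a length-M file vector,
    and p is a prime (hypothetical field size chosen for illustration).
    """
    M = len(s)
    alpha = len(Q[0])
    # Column j of Q holds the coefficients of fragment j on this node.
    return [sum(Q[m][j] * s[m] for m in range(M)) % p for j in range(alpha)]


# Hypothetical example: M = 4 file symbols, alpha = 2 stored fragments, GF(7).
s = [3, 1, 4, 1]
Q = [[1, 0],
     [0, 1],
     [1, 1],
     [2, 3]]
x = encode_node(Q, s, p=7)
```

Each entry of `x` is one stored fragment, i.e., one linear combination of the file symbols.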

Similar to [2], the graph $G(n, k, \alpha)$ consists of three types of nodes: a source node, storage nodes, and a data collector (DC). The source node contains the original file, which is to be distributed among the storage nodes. Each storage node consists of two nodes, namely, an in node and an out node, with a link of capacity $\alpha$ (the storage size) between them. The data collector can reconstruct the original file by connecting to $k$ out nodes. Yet, different from [2], the modified flow graph reflects the topology of the network. Thus, there might not exist direct channels (edges in $G$) from a surviving node to the new node. A storage node may have to forward the data of other nodes to the new node, depending on the network topology. When a node fails, all $n - 1$ surviving nodes can join the repair process. An optimization algorithm determines the optimal traffic on the links and hence the number of nodes used for repair. An example of the modified information flow graph for a distributed storage system on a four-node tandem network is given in Fig. 1, where node 4 fails and node 5 is the new node.

For analysis, we use a column vector to denote the number of fragments transmitted on the links of the network. The vector is termed the subgraph, $z = [z_{(ij)}]_{(ij) \in A}$. For a given network, our objective is to minimize the cost $\sigma_c$ of the repair process. With the subgraph $z = [z_{(ij)}]_{(ij) \in A}$ and cost functions $f_{ij}$, the repair cost is

$$\sigma_c = \sum_{(ij) \in A} f_{ij}(z_{(ij)}). \tag{1}$$
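The repair cost in (1) is simply a sum of per-link cost functions evaluated at the link flows. A minimal Python sketch, assuming a hypothetical two-link subgraph in which surviving nodes 2 and 3 each send directly to the new node 1, with quadratic costs chosen only for illustration:

```python
def repair_cost(z, cost_fns):
    """Sum f_ij(z_(ij)) over all links (i, j) of the subgraph z."""
    return sum(cost_fns[link](flow) for link, flow in z.items())


# Hypothetical subgraph: flows on links (2 -> 1) and (3 -> 1).
z = {(2, 1): 3.0, (3, 1): 1.0}
# Hypothetical convex costs: f_21(z) = z^2 and f_31(z) = 3 z^2.
cost_fns = {
    (2, 1): lambda flow: flow ** 2,
    (3, 1): lambda flow: 3.0 * flow ** 2,
}
sigma_c = repair_cost(z, cost_fns)
```

Any convex per-link function can be plugged in the same way, e.g., a linear cost `lambda flow: flow` for unit-cost links.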

B. Constraint Region 

In the repair process, it is required that any $k$ nodes can reconstruct the original file. This property is known as the regenerating code property (RCP). In the literature, the process in which a node fails and a new node is regenerated is called a stage of repair. The RCP must be preserved at every stage of repair. Thus, after each repair, the RCP must hold for the system consisting of the new node and the surviving nodes. Hence, no cut in the modified information flow graph may be smaller than $M$, the original file size. This requirement is called the cut constraint. We should therefore find the minimum $\sigma_c$ under the cut constraints. Since there are multiple cuts in the network, there are multiple cut constraints. Assuming $R$ constraints, they define the feasible region of our problem. We call this region the polytope; it is described by the following $R$ linear inequalities,

$$\sum_{(ij) \in A} h^r_{(ij)}(z_{(ij)}) \le 0 \quad \text{for } r = 1, \dots, R, \tag{2}$$

where $h^r_{(ij)}(z_{(ij)})$ is an affine function of $z_{(ij)}$ in the $r$-th constraint.
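To make the form of (2) concrete, the sketch below checks whether a candidate subgraph lies in the feasible region. It assumes each constraint is stored as a pair `(coeffs, offset)` meaning `sum(coeffs[link] * z[link]) + offset <= 0`, which lumps the constant parts of the affine functions into one offset. The single constraint used here ($z_{(21)} + z_{(31)} \ge 4$) is hypothetical and is not derived from any network in the paper.

```python
def is_feasible(z, constraints, tol=1e-9):
    """Check every affine constraint in the '<= 0' form and z_(ij) >= 0."""
    if any(flow < -tol for flow in z.values()):
        return False
    for coeffs, offset in constraints:
        value = sum(c * z[link] for link, c in coeffs.items()) + offset
        if value > tol:
            return False
    return True


# Hypothetical cut-style constraint z_(21) + z_(31) >= 4,
# rewritten as -z_(21) - z_(31) + 4 <= 0.
constraints = [({(2, 1): -1.0, (3, 1): -1.0}, 4.0)]
ok = is_feasible({(2, 1): 3.0, (3, 1): 1.0}, constraints)
bad = is_feasible({(2, 1): 1.0, (3, 1): 1.0}, constraints)
```

A lower bound on a cut, as required by the cut constraints, becomes a "<= 0" inequality simply by moving all terms to one side.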

The polytope is bounded by linear inequalities. Hence, if the $z_{(ij)}$ are real numbers, the constraint region is convex. We can reasonably assume that the $z_{(ij)}$ are real: the file is measured in bits, but it is normally quite large, so we can treat $z_{(ij)}$ as real valued. Under this assumption, the polytope is a convex region. Since the constraint region is convex, the problem is convex whenever the cost function is convex.

C. Convex Optimization 

With the constraint region and objective function, we can formulate the optimization problem as follows,

$$\begin{aligned} \text{minimize} \quad & \sum_{(ij) \in A} f_{ij}(z_{(ij)}) \\ \text{subject to} \quad & \sum_{(ij) \in A} h^r_{(ij)}(z_{(ij)}) \le 0 \quad \text{for } r = 1, \dots, R, \\ & z_{(ij)} \ge 0. \end{aligned} \tag{3}$$

Problem (3) can be solved centrally as in [4] if a central control mechanism is available, which yields the optimal-cost subgraph. Without central control, we can find the optimal-cost subgraph in a decentralized manner as follows. Corresponding to the minimum-cost subgraph for $\alpha = M/k$, we can also find a decentralized coding scheme (e.g., random linear network codes) for the repair that satisfies the RCP (to be shown in Section IV).
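As a point of reference for the decentralized methods, a problem of the form (3) can be solved centrally when it is tiny. The sketch below uses a plain grid search, which is a stand-in chosen for simplicity and not the centralized method of [4]. The toy instance (two links, costs $z^2$ and $3z^2$, one constraint $z_{(21)} + z_{(31)} \ge 4$) and all names are hypothetical.

```python
import itertools


def centralized_grid_search(cost, feasible, grid):
    """Exhaustively search a 2-link grid for the cheapest feasible subgraph."""
    best_z, best_cost = None, float("inf")
    for z in itertools.product(grid, repeat=2):
        if feasible(z):
            c = cost(z)
            if c < best_cost:
                best_z, best_cost = z, c
    return best_z, best_cost


grid = [0.5 * m for m in range(13)]                # 0, 0.5, ..., 6
cost = lambda z: z[0] ** 2 + 3.0 * z[1] ** 2       # hypothetical convex costs
feasible = lambda z: 4.0 - z[0] - z[1] <= 0.0      # hypothetical constraint
best_z, best_cost = centralized_grid_search(cost, feasible, grid)
```

Grid search scales exponentially in the number of links, so it is only useful as a sanity check on very small instances.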

III. Minimum-Cost Subgraph by Decentralized 
Algorithms 

We first show that problem (3) can be separated into $(n - 1)$ subproblems. To decouple the problem into subproblems, we apply primal and dual decomposition methods [9]. These approaches lead to distributed algorithms for finding the optimal-cost repair subgraph. We then analyze their properties and evaluate their performance.

A. Primal Decomposition 

The cost function of problem (3) can be decoupled into $n - 1$ parts, each associated with a surviving node. Every node then solves an optimization problem locally, and a master node coordinates the problem solving (we shall show that this master problem can also be solved in a decentralized way through communication between nodes). Without loss of generality, we assume node 1 fails. For the decomposition, we rewrite problem (3) in the following form,

$$\begin{aligned} \text{minimize} \quad & \sum_{i=2}^{n} \sum_{\{j|(ij) \in A\}} f_{ij}(z_{(ij)}) \\ \text{subject to} \quad & \sum_{i=2}^{n} \sum_{\{j|(ij) \in A\}} h^r_{(ij)}(z_{(ij)}) \le 0 \quad \text{for } r = 1, \dots, R, \\ & z_{(ij)} \ge 0. \end{aligned} \tag{4}$$

Then, using primal decomposition with a coupling constraint [9], each node $i$ minimizes its transmission cost by solving

$$\begin{aligned} \text{minimize} \quad & \sum_{\{j|(ij) \in A\}} f_{ij}(z_{(ij)}) \\ \text{subject to} \quad & \sum_{\{j|(ij) \in A\}} h^r_{(ij)}(z_{(ij)}) \le t_i^r \quad \text{for } r = 1, \dots, R, \\ & z_{(ij)} \ge 0. \end{aligned} \tag{5}$$

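Finally, the following master problem iteratively updates the parameters $t_2^1, \dots, t_2^R, \dots, t_n^R$:

$$\begin{aligned} \text{minimize} \quad & \phi = \phi_2(t_2^1, t_2^2, \dots, t_2^R) + \dots + \phi_n(t_n^1, \dots, t_n^R), \\ \text{subject to} \quad & t_2^r + \dots + t_i^r + \dots + t_n^r = 0, \end{aligned} \tag{6}$$

where, for each node, $\phi_i(t_i^1, t_i^2, \dots, t_i^R)$ is calculated using the Lagrange dual function, with $\lambda_i^1, \dots, \lambda_i^R$ the Lagrangian variables of the $R$ inequality constraints in subproblem $i$, as

$$\phi_i(t_i^1, \dots, t_i^R) = \sup_{\lambda_i^1, \dots, \lambda_i^R} \; \inf_{z_{(ij)}} \; f_i(z_{(ij)}) - \lambda_i^1 \big( h_i^1(z_{(ij)}) - t_i^1 \big) - \dots - \lambda_i^R \big( h_i^R(z_{(ij)}) - t_i^R \big). \tag{7}$$

Here $f_i$ and $h_i^r$ denote node $i$'s objective and $r$-th constraint function in subproblem (5). We can eliminate the constraint in (6) by setting $t_n^r = -(t_2^r + \dots + t_i^r + \dots + t_{n-1}^r)$ in subproblem $n$. Thus, the gradient of the function $\phi(t_2^1, \dots, t_2^R, \dots, t_{n-1}^1, \dots, t_{n-1}^R)$ in (6) is

$$\Delta_p = (\lambda_2^1 - \lambda_n^1, \dots, \lambda_2^R - \lambda_n^R, \dots, \lambda_{n-1}^1 - \lambda_n^1, \dots, \lambda_{n-1}^R - \lambda_n^R). \tag{8}$$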

Therefore, the iterative algorithm is

Algorithm 1: Primal iterative algorithm
Repeat:

1) Every node solves a subproblem.
Node $i$, for $2 \le i \le n$, solves subproblem (5), finding $z_{(ij)}$ for $(ij) \in A$ and $(\lambda_i^1, \dots, \lambda_i^R)$.

2) Update the vector $t = (t_2^1, \dots, t_2^R, \dots, t_{n-1}^1, \dots, t_{n-1}^R)$:
$t := t - \alpha_k \Delta_p$, where $\alpha_k$ is the iteration step length.

Until: The stopping criterion (given below) is satisfied.

The algorithm can be stopped after a pre-defined number $T$ of iterations in delay-sensitive conditions, or after achieving a certain level of accuracy (e.g., $|\sigma_c(k) - \sigma_c(k-1)| < \epsilon$, where $\epsilon$ is small and positive). The properties of Algorithm 1 are discussed as follows.

1) Optimality: Problem (4) has feasible solutions (e.g., by simply assigning $z_{(ij)} = M$, all the cut constraints are satisfied). According to [9], problem (4) and the decomposed problem are equivalent. Hence, once convergence of the decomposed problem is proved, it converges to the optimal solution.

2) Convergence:

Proposition 1: For the decomposed problems (5), (6), Algorithm 1 converges to the optimal solution.
Proof: The proof is similar to that in [?].

3) Implementing Algorithm 1 in a decentralized way: It may seem that Algorithm 1 is still not fully decentralized, since a node is needed to solve the master problem. However, inspecting the master update equation, we see that it can be broken into $n - 1$ parts if the nodes can communicate with each other. That is,

$$\Delta_p = (\Delta_{p2}, \dots, \Delta_{p(n-1)}), \tag{9}$$

where for node $i$, $2 \le i \le (n - 1)$,

$$\Delta_{pi} = (\lambda_i^1 - \lambda_n^1, \dots, \lambda_i^R - \lambda_n^R). \tag{10}$$

Consequently, at the end of each iteration, node $i$ receives $(\lambda_n^1, \dots, \lambda_n^R)$ and updates its master equation as

$$t_i^r := t_i^r - \alpha_k \Delta_{pi}(r), \quad \text{for } r = 1, \dots, R. \tag{11}$$

Node $i$ also sends the updated results to node $n$. Since we assume there exists a path between any pair of nodes, the nodes can thus communicate and update their master equations.
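The primal iteration and the per-node update (11) can be sketched on a toy instance. Everything specific here is an assumption for illustration: two surviving nodes (2 and 3), hypothetical costs $z^2$ and $3z^2$, one coupling constraint $z_2 + z_3 \ge 4$ split evenly so that $h_i(z) = 2 - z$, a closed-form subproblem solution, and a step length $0.1/\sqrt{k}$. The multiplier follows the sign convention of (7), so it equals the sensitivity of the local optimal cost to $t_i$.

```python
import math


def solve_subproblem(c, share, t):
    """Node-local problem: minimize c*z**2 subject to (share - z) <= t, z >= 0.

    Returns (z, lam), where lam is the sensitivity d(phi_i)/d(t_i) of the
    local optimal cost; it plays the role of the node's Lagrangian variable.
    """
    z = max(0.0, share - t)   # smallest flow that satisfies the constraint
    lam = -2.0 * c * z        # derivative of c*(share - t)**2 w.r.t. t
    return z, lam


def primal_decomposition(c2=1.0, c3=3.0, demand=4.0, iterations=200, step0=0.1):
    t2 = 0.0                  # node 3 uses t3 = -t2, so that t2 + t3 = 0
    for k in range(1, iterations + 1):
        z2, lam2 = solve_subproblem(c2, demand / 2.0, t2)
        z3, lam3 = solve_subproblem(c3, demand / 2.0, -t2)
        # Node 2 receives lam3 and takes a gradient step on its own t.
        t2 -= (step0 / math.sqrt(k)) * (lam2 - lam3)
    z2, _ = solve_subproblem(c2, demand / 2.0, t2)
    z3, _ = solve_subproblem(c3, demand / 2.0, -t2)
    return z2, z3


z2, z3 = primal_decomposition()
```

In this toy instance the only information exchanged per iteration is the last node's multiplier, which mirrors the message pattern described above.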

B. Dual Decomposition 

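For dual decomposition, we compute the dual function of the optimization problem (4) and then decouple the problem into $(n - 1)$ subproblems as follows

$$\begin{aligned} g(\lambda, z) &= \sum_{i=2}^{n} \sum_{\{j|(ij) \in A\}} f_{ij}(z_{(ij)}) - \lambda^1 \Big( \sum_{i=2}^{n} \sum_{\{j|(ij) \in A\}} h^1_{(ij)}(z_{(ij)}) \Big) - \dots - \lambda^R \Big( \sum_{i=2}^{n} \sum_{\{j|(ij) \in A\}} h^R_{(ij)}(z_{(ij)}) \Big) \\ &= \sum_{i=2}^{n} \Big( \sum_{\{j|(ij) \in A\}} f_{ij}(z_{(ij)}) - \sum_{r=1}^{R} \lambda^r \sum_{\{j|(ij) \in A\}} h^r_{(ij)}(z_{(ij)}) \Big), \end{aligned}$$

where $\lambda = (\lambda^1, \dots, \lambda^R)$ are the Lagrangian variables associated with the $R$ inequalities in problem (3). Therefore, the optimization problem can be solved in a distributed way by the $(n - 1)$ surviving nodes, where node $i$, $2 \le i \le n$, solves the following problem

$$g(\lambda) = \min_{z_{(ij)},\, (ij) \in A} \; \sum_{\{j|(ij) \in A\}} f_{ij}(z_{(ij)}) - \sum_{r=1}^{R} \lambda^r \sum_{\{j|(ij) \in A\}} h^r_{(ij)}(z_{(ij)}). \tag{12}$$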



The vector $\lambda = (\lambda^1, \dots, \lambda^R)$ is updated after each iteration in order to minimize the duality gap by

$$\max_{\lambda} \; g(\lambda). \tag{13}$$

Since the gradient of $g(\lambda)$ with respect to the variable $\lambda^r$ is

$$g'_r = \frac{\partial g}{\partial \lambda^r} = -\sum_{i=2}^{n} \sum_{\{j|(ij) \in A\}} h^r_{(ij)}(z_{(ij)}), \tag{14}$$

the iterative algorithm is

Algorithm 2: Dual iterative algorithm
Repeat:

1) Every node solves the minimization problem (12), resulting in $z_{(ij)}$ for $(ij) \in A$.

2) Update the vector $\lambda = (\lambda^1, \dots, \lambda^R)$:
$\lambda^r := \lambda^r + \alpha_k g'_r$, where $\alpha_k$ is the iteration step length.

Until: The stopping criterion (as in Algorithm 1) is satisfied.

We discuss the properties of Algorithm 2 as follows.

1) Optimality: For a convex cost function, since the constraints in problem (4) are non-strict linear inequalities, the refined Slater condition is satisfied [8]. Therefore, strong duality holds for any convex cost in problem (4).

2) Convergence: Since Algorithm 2 uses a gradient method, it is straightforward to show convergence [8].

3) Implementing Algorithm 2 in a decentralized way: Similar to Algorithm 1, the update equation can be decoupled into $(n - 1)$ parts.
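The dual iteration can be sketched on a small toy instance. The instance is hypothetical: two surviving nodes with costs $z^2$ and $3z^2$ and a single constraint $4 - z_2 - z_3 \le 0$. With the minus sign in front of $\lambda$ used in (12), this sketch additionally assumes that $\lambda$ is kept non-positive by projection, and it solves each node's minimization in closed form. The step length $0.5/\sqrt{k}$ follows the value quoted for the experiments.

```python
import math


def node_minimizer(c, lam):
    """Node-local step: minimize c*z**2 - lam*(share - z) over z >= 0.

    Setting the derivative 2*c*z + lam to zero gives z = -lam / (2*c),
    which is non-negative because lam <= 0 in this sign convention.
    """
    return max(0.0, -lam / (2.0 * c))


def dual_decomposition(c2=1.0, c3=3.0, demand=4.0, iterations=2000):
    lam = 0.0
    for k in range(1, iterations + 1):
        z2 = node_minimizer(c2, lam)
        z3 = node_minimizer(c3, lam)
        # Gradient of g w.r.t. lam: minus the summed constraint functions.
        grad = -(demand - z2 - z3)
        # Gradient ascent step, projected onto lam <= 0 (sketch assumption).
        lam = min(0.0, lam + (0.5 / math.sqrt(k)) * grad)
    return node_minimizer(c2, lam), node_minimizer(c3, lam), lam


z2, z3, lam = dual_decomposition()
```

Each node needs only the current multiplier to solve its local problem, and the multiplier update needs only the sum of the nodes' constraint values.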

C. Numerical results 

For illustration, we apply the decentralized algorithms to the 4-node tandem network in Fig. 1 and the $2 \times 3$ grid network in Fig. 2. Then, we numerically compare their convergence behavior. First, we use the distributed algorithms on a repair process of the distributed storage system in Fig. 1. Consider a source file of size $M = 4$ packets distributed among 4 nodes such that any $k = 2$ nodes can recover the original file. Assume that transmitting one packet between neighboring nodes costs one unit ($f_{ij}(z_{(ij)}) = z_{(ij)}$). If node 4 fails, the optimization problem is formulated as follows,



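$$\begin{aligned} \text{minimize} \quad & f(z) = z_{(12)} + z_{(23)} + z_{(35)} \\ \text{subject to} \quad & \text{cut constraints on } z_{(35)},\; z_{(23)},\; \text{and } z_{(12)} + z_{(35)}. \end{aligned} \tag{15}$$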

If the problem can be solved centrally, the optimal approach regenerates the new node with 4 units of transmission cost, as in [4]. Fig. 3 compares the results of the distributed algorithms based on primal and dual decomposition when $\alpha_k = 0.5/\sqrt{k}$. We can see that the primal approach converges very slowly, whereas the dual algorithm converges very fast to the optimal value (of the centralized approach). However, the convergence behavior may vary for different networks. Consider the example in Fig. 2. We assume $M = 8$ packets are distributed among 6 nodes in the grid network such that any 4 nodes can reconstruct the original file. As shown in Fig. 4, the dual algorithm converges slowly to the optimal value of the centralized approach. Primal decomposition converges faster than the dual algorithm in this network. This difference might stem from the difference in network structure.

Fig. 2. Optimization repair in the $2 \times 3$ grid network. Each solid line represents transmission of one packet. Dashed lines show available links which are not used in the repair process.

[Plot: Convergence of primal and dual algorithms in 4-node tandem network]
Fig. 3. Distributed algorithms for finding optimal-cost repair in the 4-node tandem network, $\alpha_k = 0.5/\sqrt{k}$.



[Plot: Convergence of primal and dual algorithms in $2 \times 3$ grid network]
Fig. 4. Distributed algorithms for finding optimal-cost repair in the $2 \times 3$ grid network, $\alpha_k = 0.5/\sqrt{k}$.



IV. Decentralized Optimal-Cost Minimum Storage 
Regenerating (OC-MSR) Code Construction 

In this section, we illustrate how to construct the regenerating code corresponding to the optimal-cost subgraph of Section III. In [?], [3], decentralized codes for distributing data among storage nodes have been suggested based on rateless fountain codes and linear network coding. For the repair problem, an optimum-bandwidth code has been suggested by Wu [5]. Subsequently, the author of [5] finds the sufficient finite field size for the linear code. We seek the optimum-cost minimum storage regenerating (MSR) code. Here MSR means that $\alpha = M/k$. Consequently, we find the required finite field size for the linear code that regenerates the new node with the RCP (with high probability).

To formulate the problem, suppose there is a source file of size $M$ which is divided into $k(n - k)$ fragments and coded with a regenerating code (satisfying the RCP) into $n(n - k)$ fragments. The code blocks are distributed among $n$ nodes $(Q_1, Q_2, \dots, Q_n)$. Every node stores $\alpha = M/k = (n - k)$ fragments with the code $Q_i = [q_i^1, q_i^2, \dots, q_i^{n-k}]$, where each $q_i^j$ is a vector of code coefficients over the finite field. When a node fails (say, node 1 with code $Q_1$), the optimization algorithm finds the minimum-cost subgraph. Using random network coding over a proper finite field guarantees the regeneration of a new node $Q'_1$ satisfying the RCP. As proof, we have Lemma 1 as follows.

Lemma 1: In the repair process of node 1 described by optimization problem (4), for any selection of $k - 1$ surviving nodes $(Q_{s_1}, \dots, Q_{s_{k-1}})$, there exist code coefficients for which the matrix $[Q'_1, Q_{s_1}, \dots, Q_{s_{k-1}}]$ has full rank. That is,

$$\prod_{\{s_1, \dots, s_{k-1}\} \subset \{2, \dots, n\}} \det\big([Q'_1, Q_{s_1}, \dots, Q_{s_{k-1}}]\big) \ne 0. \tag{16}$$

Proof: Owing to space limitations, we skip the proof here. ■
In optimal-cost repair, surviving nodes are allowed to cooperate (SNC) in order to reduce the cost, as in [4]. With SNC, network coding is also used in intermediate storage nodes. The coding process may increase the degree of the new node's polynomial, considering the determinant in terms of the coding variables [10]. The maximum degree of the new node's polynomial is determined by the maximum number of times that network coding is applied to a specific fragment. We denote this number by $n_{nc}$. For instance, $n_{nc} = 2$ in a scenario where there exist direct links from the surviving nodes to the new node [2], [?]: one step of coding in the surviving node and another in the new node. In general, $n_{nc} \ge 2$ in a multi-hop structure using SNC, since intermediate nodes also perform network coding on their received fragments. Thus, for a more general scenario, we have the following result.

Theorem 1: For a distributed storage system with parameters $G(n, k, \alpha)$ and a source file of size $M$, if the finite field size is greater than $d_0$, there exists a linear network code such that at any stage the RCP is satisfied, regardless of how many
failures/repairs happened before, where $d_0 = \binom{n}{k} M n_{nc}$.

Proof: The proof is similar to the proof in [5]. ■



With a sufficient field size, the network codes can be easily constructed by, e.g., the random linear network coding approach [7]. In summary, OC-MSR codes are obtained in two steps. First, the optimal-cost subgraph is found; this step is decoupled from coding. Then, to construct the code of the new node, network coding coefficients are chosen (e.g., randomly) from a sufficiently large finite field (specified by Theorem 1) so that the probability of regenerating a new node satisfying the RCP is close to 1.
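The effect of the field size on random coefficient selection can be illustrated numerically. The sketch below is not the OC-MSR construction itself: as a simplifying assumption, a uniformly random square matrix over a prime field GF(p) stands in for the matrix whose full rank is required, and the helper names, matrix size, trial count, and field sizes are all hypothetical.

```python
import random


def rank_mod_p(matrix, p):
    """Rank of a matrix over GF(p), p prime, by Gaussian elimination."""
    rows = [[v % p for v in row] for row in matrix]
    rank = 0
    n_cols = len(rows[0]) if rows else 0
    for col in range(n_cols):
        pivot = next((r for r in range(rank, len(rows)) if rows[r][col]), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        inv = pow(rows[rank][col], p - 2, p)  # Fermat inverse, valid for prime p
        rows[rank] = [(v * inv) % p for v in rows[rank]]
        for r in range(len(rows)):
            if r != rank and rows[r][col]:
                factor = rows[r][col]
                rows[r] = [(a - factor * b) % p
                           for a, b in zip(rows[r], rows[rank])]
        rank += 1
    return rank


def full_rank_fraction(size, p, trials, seed=0):
    """Fraction of random size x size matrices over GF(p) with full rank."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        m = [[rng.randrange(p) for _ in range(size)] for _ in range(size)]
        hits += rank_mod_p(m, p) == size
    return hits / trials


small_field = full_rank_fraction(size=4, p=2, trials=300)
large_field = full_rank_fraction(size=4, p=257, trials=300)
```

With these settings the larger field should yield a noticeably higher fraction of full-rank draws, which is the qualitative behavior that motivates choosing coefficients from a sufficiently large field.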

V. Conclusion 

We study a decentralized approach for optimal-cost repair in a distributed storage system. We formulate the decentralized optimum-cost problem as a convex optimization problem for networks with convex transmission costs. Primal and dual decomposition approaches are used to decouple the problem into subproblems to be solved locally. We further study the convergence properties of the algorithms. Numerical results show that dual decomposition converges much faster for the tandem network, whereas primal decomposition is faster for the grid network. Finally, we discuss the construction of optimal-cost regenerating codes and the field size of the codes.